Concept Space Synset Manager Tool

Authors

  • Apurva Nagvenkar
  • Neha Prabhugaonkar
  • Venkatesh Prabhu
  • Ramdas Karmali
  • Jyoti Pawar

Abstract

The IndoWordNet [1] Consortium consists of member institutions developing WordNets using the expansion approach. WordNets developed with the expansion approach are strongly influenced by the source language and may not reflect the richness of the target language (Walawalikar et al., 2010). The IndoWordNet community therefore decided to develop concepts specific to each member's language, that is, language-specific concepts, which will help increase WordNet coverage. It was also felt that it should be possible to maintain additional information about each concept, such as an image, a document describing the concept, and links to websites and other resources. In this paper, we discuss the Concept Space Synset Management Tool (CSS) [2], which was developed to assist the creation of language-specific concepts/synsets and to manage their linkages to other Indian language WordNets.

[1] http://www.cfilt.iitb.ac.in/indowordnet
[2] http://indradhanush.unigoa.ac.in/conceptspace

1 Background and Motivation

The IndoWordNet is a multilingual WordNet that links the WordNets of different Indian languages through a common identification number, called the synset Id, given to each concept (Bhattacharyya, 2010). A WordNet is designed to capture the vocabulary of a language and can be considered a dictionary cum thesaurus and much more (Miller et al., 1993; Miller, 1995; Fellbaum, 1998). A synset (Fellbaum, 1998) is composed of a gloss describing the concept, example sentences, and a set of synonym words used for the concept. Besides synset data, a WordNet maintains many lexical and semantic relations. Table 1 gives the number of concepts/synsets created by the language groups of the Indradhanush WordNet Consortium, which is part of the IndoWordNet Consortium.

Table 1: Synset linkage status

In addition, each member of the Indradhanush WordNet Consortium has created a sense-marked newspaper corpus of at least 100,000 words (sense marking is the task of tagging each word of the corpus with its WordNet sense). The coverage was found to be low. To increase WordNet coverage, it was decided that all language groups would create a corpus and sense mark it, and that concepts specific to each language, that is, language-specific concepts, would be added to offset the influence of the source language on the target language WordNet. The CSS Manager Tool was developed to assist in creating language-specific concepts, linking them to other language WordNets, providing additional information about synsets, etc.

The rest of the paper is organized as follows: section 2 introduces related work; section 3 presents the features of the CSS Manager Tool; section 4 presents its architecture; section 5 presents the implementation details, followed by the conclusion and future work.
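
The synset structure described in the Background section (a gloss, example sentences, and a set of synonym words, with a common synset Id linking the same concept across languages) can be illustrated with a minimal Python sketch. This is not the CSS Manager Tool's actual data model, which this text does not describe; the names `Synset`, `ConceptSpace`, `linked`, and `language_specific`, the language codes, and the sample entries are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Synset:
    """One concept in one language's WordNet (field names are assumptions)."""
    synset_id: int   # common Id shared by linked synsets across languages
    language: str    # e.g. a language code; the codes used here are made up
    gloss: str       # text describing the concept
    examples: list[str] = field(default_factory=list)
    synonyms: list[str] = field(default_factory=list)


class ConceptSpace:
    """Toy in-memory store keyed by (synset Id, language)."""

    def __init__(self) -> None:
        self._synsets: dict[tuple[int, str], Synset] = {}

    def add(self, synset: Synset) -> None:
        self._synsets[(synset.synset_id, synset.language)] = synset

    def linked(self, synset_id: int) -> dict[str, Synset]:
        """Return every language's synset that shares this synset Id."""
        return {
            lang: s
            for (sid, lang), s in self._synsets.items()
            if sid == synset_id
        }

    def language_specific(self, language: str) -> list[Synset]:
        """Synsets of `language` whose Id is not yet linked to any other language."""
        return [
            s
            for (sid, lang), s in self._synsets.items()
            if lang == language and len(self.linked(sid)) == 1
        ]


# Hypothetical sample data: Id 1 exists in two languages, Id 2 in only one.
space = ConceptSpace()
space.add(Synset(1, "lang_a", "placeholder gloss", synonyms=["word_a1"]))
space.add(Synset(1, "lang_b", "placeholder gloss", synonyms=["word_b1"]))
space.add(Synset(2, "lang_b", "placeholder gloss for a language-specific concept"))

print(sorted(space.linked(1)))
print([s.synset_id for s in space.language_specific("lang_b")])
```

In this sketch a concept counts as language-specific simply because no other language has a synset with the same Id yet; a real tool would presumably track linkage status explicitly.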

Similar resources

Automatic Construction of Persian ICT WordNet using Princeton WordNet

WordNet is a large lexical database of English language, in which, nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is essentially used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...


WordNet, EuroWordNet and Global WordNet

1 WordNet In 1978, George Miller started the development of a database with conceptual relations, as an implementation of a model of the mental lexicon. The database, called WordNet, was organized around the notion of a synset between which semantic relations are expressed. A synset is a set of words with the same part-of-speech that can be interchanged in a certain context. For example, {car; ...


EMMA: Explicit Model Checking Manager (Tool Presentation)

Although model checking is usually described as an automatic technique, the verification process with the use of model checker is far from being fully automatic. In this paper we elaborate on concept of a verification manager, which contributes to automation of the verification process by enabling efficient parallel combination of different verification techniques. We introduce a tool EMMA (Exp...


Collaborative Work on Indonesian WordNet through Asian WordNet (AWN)

This paper describes collaborative work on developing Indonesian WordNet in the AsianWordNet (AWN). We will describe the method to develop for collaborative editing to review and complete the translation of synset. This paper aims to create linkage among Asian languages by adopting the concept of semantic relations and synset expressed in WordNet.


EMMA: Explicit Model Checking Manager

Although model checking is usually described as an automatic technique, the verification process with the use of model checker is far from being fully automatic. In this paper we elaborate on concept of a verification manager, which contributes to automation of the verification process by enabling efficient parallel combination of different verification techniques. We introduce a tool EMMA (Exp...


Publication date: 2014